AITopics | quality data

quality data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Not Every AI Problem Is a Data Problem

Communications of the ACM, Sep-23-2025, 17:32:46 GMT

Why we should be intentional about data scaling. Large language models (LLMs) have revolutionized the AI landscape, demonstrating remarkable capabilities across a wide range of tasks. Each new model seemingly reinforces the notion that modern transformer-based AI can conquer any challenge if armed with sufficient compute and data. However, while scaling has accelerated certain applications, such as robotics, it has yet to show significant impact in others, such as identifying misinformation.

artificial intelligence, data-driven scaling, quality data

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Clara County > Mountain View (0.05)

Industry:

Information Technology (0.47)
Media > News (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Not Every AI Problem is a Data Problem: We Should Be Intentional About Data Scaling

Rodchenko, Tanya, Noy, Natasha, Scherrer, Nino, Prendki, Jennifer

arXiv.org Artificial Intelligence, Jan-23-2025

For example, translation between languages exhibits regular and persistent patterns at different scales (across sentences, paragraphs, and documents). In general, language patterns are stable over time. We know what type of data we need to expand to new languages, and while it may be challenging to acquire data for rare or purely spoken languages, it is easy to judge whether newly acquired data is what we need. In contrast, use cases where the data lacks strong, persistent topological features, or where its structure is highly fragmented or unstable over time, may not be as well suited to data scaling approaches.

arxiv preprint arxiv, large language model, machine learning

2501.13779

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Novel Regression and Least Square Support Vector Machine Learning Technique for Air Pollution Forecasting

M, Dhanalakshmi, V, Radha

arXiv.org Artificial Intelligence, Jun-11-2023

Air pollution is the release of particulate matter, chemicals, or biological substances that harm humans or other living creatures or degrade the natural habitat and the atmosphere. It remains one of the most pressing environmental issues for metropolitan cities. Several air pollution indicators are reported to have a negative influence on human health, and inaccurate detection of these indicators can lead to severe complications for humans and other living creatures. To address this, a novel technique called Discretized Regression and Least Square Support Vector (DR-LSSV) based air pollution forecasting is proposed. The results indicate that the proposed DR-LSSV technique can efficiently enhance air pollution forecasting performance and outperforms conventional machine learning methods in terms of forecasting accuracy, forecasting time, and false positive rate.
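The abstract does not describe the discretization step, so the sketch below covers only the least-squares support vector component, using the generic LS-SVM regression formulation: training reduces to one linear system, [[0, 1ᵀ], [1, K + I/γ]]·[b, α] = [0, y], instead of a quadratic program. The RBF kernel, the `gamma` and `sigma` values, the lagged-reading features, and the toy readings are all assumptions for illustration, not the authors' settings.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(a, rhs):
    """Solve a·x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_lssvm(xs, ys, gamma=10.0, sigma=1.0):
    """Fit LS-SVM regression; returns (bias, alphas, training inputs, sigma)."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]  # first row enforces sum(alpha) = 0
    for i in range(n):
        row = [1.0]
        for j in range(n):
            ridge = 1.0 / gamma if i == j else 0.0  # I/gamma regularization
            row.append(rbf_kernel(xs[i], xs[j], sigma) + ridge)
        a.append(row)
    sol = solve_linear(a, [0.0] + list(ys))
    return sol[0], sol[1:], xs, sigma


def predict_lssvm(model, x):
    """f(x) = b + sum_i alpha_i * K(x, x_i)."""
    b, alphas, xs_train, sigma = model
    return b + sum(al * rbf_kernel(x, xt, sigma) for al, xt in zip(alphas, xs_train))


# Made-up PM2.5 series: predict each reading from the two before it (scaled by 1/100).
readings = [42.0, 48.0, 55.0, 51.0, 60.0, 66.0, 63.0]
xs = [[readings[t - 2] / 100, readings[t - 1] / 100] for t in range(2, len(readings))]
ys = [readings[t] / 100 for t in range(2, len(readings))]
model = fit_lssvm(xs, ys, gamma=10.0, sigma=0.5)
next_reading = 100 * predict_lssvm(model, [readings[-2] / 100, readings[-1] / 100])
```

Because every training residual equals αᵢ/γ, `gamma` directly trades fit against smoothness; a real forecaster would tune it and `sigma` on held-out data.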

air quality data, forecasting, quality data

doi: 10.14445/22315381/IJETT-V71I4P214

2306.07301

Country:

Asia > India (0.29)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine (1.00)
Law > Environmental Law (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Significance of Data Quality in Making a Successful Machine Learning Model - KDnuggets

#artificialintelligence, Jan-15-2023, 19:30:20 GMT

AI has been a buzzword for quite some time now and is highly ubiquitous. AI-enabled applications have proliferated in the market. We have also been 'blessed' with powerful infrastructure and advanced algorithms. However, that does not make the journey of taking your machine learning (ML) project to production any easier. Data quality issues are not new; they have drawn attention ever since ML applications first emerged.

artificial intelligence, data quality, successful machine learning model

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

7 Tips for Value-Driven AI

#artificialintelligence, Nov-30-2022, 11:25:54 GMT

How your business can improve the skills of its talent to take greater advantage of AI. There's no doubt that artificial intelligence (AI) is changing the way business is done today. AI will ultimately transform every business in every industry. However, despite their desire to use data science when making decisions, many organizations can't find enough qualified data scientists to develop and run their data science initiatives. Nonetheless, with online training and readily available tools, any software engineer -- or even a business user with a math background -- can become a data scientist.

data scientist, software engineer, value-driven ai

Industry: Education > Educational Setting > Online (0.56)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.38)

Discretized Linear Regression and Multiclass Support Vector Based Air Pollution Forecasting Technique

M, Dhanalakshmi, V, Radha

arXiv.org Artificial Intelligence, Nov-28-2022

In developing countries, air pollution is a critical issue arising from the uncontrolled use of traditional energy sources, so effective air pollution forecasting methods are indispensable for minimizing the risk. To that end, this paper proposes an Internet of Things (IoT) enabled system for monitoring and controlling air pollution in a cloud computing environment. A method called Linear Regression and Multiclass Support Vector (LR-MSV) IoT-based Air Pollution Forecast is proposed to monitor air quality data and air quality index (AQI) measurements so that pollution can be controlled effectively. Extensive experiments on an Indian air quality dataset show that the proposed LR-MSV method outperforms well-established state-of-the-art methods: it achieves significantly higher air pollution forecasting accuracy while reducing forecasting time and error rate.
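The abstract names two ingredients, a linear-regression forecast and a multiclass view of the air quality index, without giving details. As a hedged illustration only, the sketch below extrapolates a pollutant series with ordinary least squares and maps the forecast to an AQI sub-index with the standard piecewise-linear interpolation I = (I_hi − I_lo)/(C_hi − C_lo)·(C − C_lo) + I_lo. The category lookup is a plain table, not a support vector classifier, and the PM2.5 breakpoint table is an assumption loosely modeled on India's National AQI bands; check it against the official table before relying on it.

```python
# Assumed, simplified PM2.5 breakpoints in µg/m³ (24-hour average).
# Each row: (conc_lo, conc_hi, index_lo, index_hi, category).
PM25_BANDS = [
    (0.0, 30.0, 0, 50, "Good"),
    (30.0, 60.0, 50, 100, "Satisfactory"),
    (60.0, 90.0, 100, 200, "Moderate"),
    (90.0, 120.0, 200, 300, "Poor"),
    (120.0, 250.0, 300, 400, "Very Poor"),
]


def aqi_sub_index(conc):
    """Interpolate linearly inside the first band that contains conc."""
    for c_lo, c_hi, i_lo, i_hi, category in PM25_BANDS:
        if c_lo <= conc <= c_hi:
            index = (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
            return index, category
    raise ValueError(f"{conc} is outside this sketch's breakpoint table")


def linear_forecast(readings):
    """Fit y = intercept + slope * t by ordinary least squares over
    t = 0..n-1, then extrapolate one step ahead (t = n)."""
    n = len(readings)
    t_mean = (n - 1) / 2
    y_mean = sum(readings) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(readings))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


recent = [40.0, 42.0, 44.0, 46.0]  # made-up daily PM2.5 averages
forecast = linear_forecast(recent)
index, category = aqi_sub_index(forecast)
```

In a full system the overall AQI would typically be the maximum of the sub-indices computed for several pollutants; this sketch handles a single pollutant.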

artificial intelligence, machine learning, quality data

doi: 10.14445/22315381/IJETT-V70I11P234

2211.15095

Country:

Asia > India (0.26)
North America > United States > California (0.04)
Asia > China > Hong Kong (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry:

Materials > Chemicals (0.93)
Energy (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.73)

No, You're Not Alone. Google Is Also Making This Big Mistake On AI

#artificialintelligence, Sep-10-2022, 17:56:00 GMT

Just this past month, an article was shared showing that over 30% of the data Google used for one of its shared machine learning models was mislabeled. Not only was the model itself full of errors, but the training data behind it was full of mistakes as well. How could anyone using Google's model ever hope to trust its results if they rest on human-induced errors that computers can't fix? And Google isn't alone in mislabeling data: an MIT study in 2021 found that almost 6% of the images in the industry-standard ImageNet database are mislabeled, and furthermore found "label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets". How can we hope to trust or use these models if the data used to train them is so bad?
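One way to surface suspect labels like these automatically is a simplified form of the confident-learning idea, sketched below. For each class j, a threshold t_j is the model's average predicted probability of j over samples labeled j; a sample is flagged when the class it confidently belongs to (the highest-probability class that clears its own threshold) differs from its given label. This is an illustrative sketch, not the cleanlab library's API; it assumes the probabilities are out-of-sample (for example, from cross-validation), and the toy numbers are made up.

```python
def per_class_thresholds(probs, labels, n_classes):
    """t_j = mean predicted probability of class j over samples labeled j."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    # A class with no labeled samples gets a threshold nothing can reach.
    return [s / c if c else float("inf") for s, c in zip(sums, counts)]


def find_label_issues(probs, labels):
    """Return indices whose confidently predicted class differs from the label."""
    n_classes = len(probs[0])
    thresholds = per_class_thresholds(probs, labels, n_classes)
    issues = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # no class clears its threshold, so no judgment is made
        best = max(confident, key=lambda j: p[j])
        if best != y:
            issues.append(i)
    return issues


# Made-up two-class example: the last sample looks like class 0 but is labeled 1.
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.95, 0.05]]
labels = [0, 0, 1, 1, 1]
suspects = find_label_issues(probs, labels)
```

Flagged indices are candidates for human review rather than automatic relabeling, since the model itself may be wrong.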

ai project, boat, google

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Data Centric Artificial Intelligence

#artificialintelligence, Aug-28-2022, 23:55:27 GMT

Data-centric artificial intelligence is a modern approach to building AI systems with quality data. It prioritizes the quality of data over its quantity, while traditional model-centric AI does the opposite. The key is better data, not big data! The central idea of data-centric AI is to treat data the way a builder treats high-quality materials when building a house, that is, to spend relatively more time labelling, augmenting, managing, and curating the data. The traditional approach, by contrast, is to optimize highly parameterized models on big data to achieve high performance.

data centric artificial intelligence, data-centric ai, iterative process

Technology:

Information Technology > Data Science > Data Mining (0.59)
Information Technology > Data Science > Data Quality (0.55)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

AI and Open Data

#artificialintelligence, Aug-11-2022, 01:20:33 GMT

We are excited to announce a new project, AI and Open Data: Open Data Needs for AI and International Development. Governments, researchers, and civil society tackling development problems in the global south continue to face challenges of data access and availability. Cutting-edge analytical techniques like artificial intelligence (AI) and machine learning (ML) promise to increase the effectiveness of development initiatives, but they still require quality data as inputs. Open data is as important for sustainable development as ever. As a field, AI receives significant optimism for its potential impact on sustainable development, including improving agricultural practices and productivity through aerial and remote sensing, monitoring disease outbreaks, and planning and managing energy grids.

ai and open data, global south, quality data

Country: North America > Canada (0.07)

Industry: Energy (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

DialCrowd 2.0: A Quality-Focused Dialog System Crowdsourcing Toolkit

Huynh, Jessica, Chiang, Ting-Rui, Bigham, Jeffrey, Eskenazi, Maxine

arXiv.org Artificial Intelligence, Jul-25-2022

Dialog system developers need high-quality data to train, fine-tune, and assess their systems. They often use crowdsourcing for this, since it provides large quantities of data from many workers. However, the data may not be of sufficiently good quality. This can be due to the way the requester presents a task and how they interact with the workers. This paper introduces DialCrowd 2.0 to help requesters obtain higher-quality data by, for example, presenting tasks more clearly and facilitating effective communication with workers. DialCrowd 2.0 guides developers in creating improved Human Intelligence Tasks (HITs) and is directly applicable to the workflows currently used by developers and researchers.
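The summary does not detail DialCrowd 2.0's own quality checks, so the following is only a generic illustration of one common crowdsourcing quality-control pattern: score each worker on gold-standard items whose answers are known, drop workers below an accuracy cutoff, and aggregate the remaining answers by majority vote. The `(worker, item, label)` data layout, the `min_accuracy` cutoff of 0.75, and the item and label names are all hypothetical.

```python
from collections import Counter, defaultdict


def worker_accuracy(responses, gold):
    """Accuracy of each worker on gold items; responses are (worker, item, label)."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, item, label in responses:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w: correct[w] / seen[w] for w in seen}


def aggregate(responses, gold, min_accuracy=0.75):
    """Majority-vote labels for non-gold items, counting only trusted workers.

    Workers who answered no gold items are treated as untrusted."""
    accuracy = worker_accuracy(responses, gold)
    votes = defaultdict(Counter)
    for worker, item, label in responses:
        if item in gold or accuracy.get(worker, 0.0) < min_accuracy:
            continue
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


# Hypothetical HIT: is a system response "appropriate" in its dialog context?
gold = {"g1": "yes", "g2": "no"}
responses = [
    ("w1", "g1", "yes"), ("w1", "g2", "no"), ("w1", "t1", "appropriate"),
    ("w2", "g1", "yes"), ("w2", "g2", "no"), ("w2", "t1", "appropriate"),
    ("w3", "g1", "no"), ("w3", "g2", "yes"), ("w3", "t1", "inappropriate"),
    ("w3", "t2", "inappropriate"), ("w1", "t2", "appropriate"),
]
final_labels = aggregate(responses, gold)
```

Ties in the vote are broken by whichever label was seen first, which is arbitrary; a production pipeline would more likely route tied items back for additional judgments.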

dialcrowd 2, instruction, requester

2207.12551

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (0.90)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.85)
